docs: adding translation stats to docs #511

RobPasMue · 2025-05-23T12:01:43Z

This adds Plotly as a dependency of this package so we can provide nice stats graphs.

See rendered docs for demonstration but here is a snapshot

RobPasMue · 2025-05-23T12:02:44Z

This PR is related to #493 -- second step declared in #493 (comment)

RobPasMue · 2025-05-23T12:05:18Z

The solution is a Sphinx extension that reads the generated JSON data, builds a Plotly graph (with plotly.js), and embeds it in the docs. Hovering over a bar shows the full statistics report for that module and locale, rendered in the hover tooltip.

Hope you like it! Looking forward to feedback related to it @lwasser!

_ext/translation_graph.py

lwasser · 2025-05-28T16:24:27Z

@RobPasMue this is awesome!! Here is my one question, and then I'll provide a suggestion, but let's see what @flpm and @sneakers-the-rat think! I suspect that because translating.md is NOT actually a part of our Sphinx guide, people won't be able to see the graphic that you made in the page (I could be wrong, but it looks like it needs to render Plotly).

So as a middle ground, could we do the following

Could we export the plot as a PNG too, so we could drop it in the readme file?
Could we make TRANSLATING.md a part of our Sphinx build and add a contributing tab?

That way the interactive plot gets rendered via Sphinx (and this page is no longer an "orphan").
We can also include a static image in our README file for drive-by contributors on GitHub.

y'all - let me know how that lands!

lwasser · 2025-05-28T16:25:58Z

_ext/translation_graph.py

+            ))
+
+        # Create figure
+        fig = go.Figure(data=traces)


@RobPasMue could we create a grid of plots - one for each language?

then each plot could have 3 bars - one for fuzzy, one for complete and one for incomplete (or it could be stacked bars too).

What you have now is awesome but if we add more languages it will get complex over time. And a static version of the plot would be nice too.

Sure sounds good!

I think a good idea would be a heat map, which is condensed enough we can add many more languages.

agreed on the heatmap. i would expect it oriented with languages as rows and pages as columns (which satisfies the need for expandability to future languages)
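The orientation described here (locales as rows, pages as columns) can be sketched with the standard library alone. This assumes the stats JSON maps locale to module to a stats dict with a `percentage` key, which is how the extension code in this PR reads it; the `to_matrix` name and the sample numbers are made up for illustration.

```python
def to_matrix(data):
    """Pivot {locale: {module: stats}} into row labels, column labels and a 2-D grid."""
    locales = sorted(data)
    # Union of module names, so a page missing from one locale still gets a column.
    modules = sorted({module for stats in data.values() for module in stats})
    grid = [
        [data[locale].get(module, {}).get("percentage") for module in modules]
        for locale in locales
    ]
    return locales, modules, grid


# Illustrative numbers only, not real translation stats.
sample = {
    "es": {"index": {"percentage": 100}, "tutorials": {"percentage": 40}},
    "ja": {"index": {"percentage": 80}},
}
locales, modules, grid = to_matrix(sample)
```

Pages missing from a locale come out as `None` rather than raising, and adding a language only adds a row.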

lwasser · 2025-05-28T16:27:39Z

_ext/translation_graph.py

+
+        # Create figure
+        fig = go.Figure(data=traces)
+        fig.update_layout(


Could the plot please use our pyOS colors?

Dark Purple: #33205c
Light Purple: #735fab
Pale Purple: #bab3d4
Magenta: #bb82b0
Sea Green: #81c0aa

Most def! =) I'll look into the colors as soon as I can

oh whoops. just saw this comment. probably would be good to use the css variables directly when we can to avoid having them hardcoded in multiple places. i couldn't find a rhyme or reason to when i was able to use css vars in the plotly values and when i needed to declare them in the stylesheet, but ya some examples in this comment
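As a rough sketch of the hardcoded route, the palette above can be turned into the list of `[position, color]` pairs that Plotly accepts as a `colorscale`. The choice and ordering of colors below are assumptions for illustration, not a decision made in this thread, and this does not cover the CSS-variable approach.

```python
# Hex values copied from the pyOS palette listed in this thread.
PYOS_COLORS = {
    "dark_purple": "#33205c",
    "light_purple": "#735fab",
    "pale_purple": "#bab3d4",
    "magenta": "#bb82b0",
    "sea_green": "#81c0aa",
}


def make_colorscale(hex_colors):
    """Spread colors evenly over [0, 1] as [position, color] pairs."""
    if len(hex_colors) < 2:
        raise ValueError("a colorscale needs at least two colors")
    step = 1 / (len(hex_colors) - 1)
    return [[round(i * step, 4), color] for i, color in enumerate(hex_colors)]


# Assumed ordering: pale purple (little translated) up to sea green (complete).
colorscale = make_colorscale([
    PYOS_COLORS["pale_purple"],
    PYOS_COLORS["magenta"],
    PYOS_COLORS["sea_green"],
])
```

The resulting list could be passed as the `colorscale` argument of a trace in place of a named scale.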

lwasser

This looks so good - i just suggested a few changes. We could add translating, contributing etc to the guidebook as pages in another pr as well if we want to merge this. I just worry that the beautiful work done here won't render until we add these pages to the guide (i could be wrong!).

RobPasMue · 2025-05-29T07:54:50Z

I suspect that because translating.md is NOT actually a part of our Sphinx guide, people won't be able to see the graphic that you made in the page (I could be wrong, but it looks like it needs to render Plotly).

Hi @lwasser -- just to clarify, the translation page is available in our published docs: https://www.pyopensci.org/python-package-guide/TRANSLATING.html

I think we should just link it properly to the landing page =) so that it is no longer an orphan as you mentioned.

Could we export the plot as a PNG too, so we could drop it in the readme file?

Nonetheless, this is possible if y'all prefer. Although my feeling is that only people interested in the translation would be curious about this information. So, IMO, the best location would be the TRANSLATING.md file. But I'm up for discussion! =)

flpm · 2025-05-29T19:33:37Z

Wow, that's pretty neat! 🤩 I like the idea of having an interactive visualization that people can consult and the Translation guide feels like a natural place.

We should add a link to the live version on the site inside TRANSLATING.md. I imagine most users will look at the md file in their own clones or on GitHub, where they will not see the chart at first. The link will be a quick way to get them to the real data.

On the visualization itself: I am not sure the bar chart is the best approach to show this data. In my head it feels more natural to imagine it as a heat map, where the rows are the files, the columns are the languages.

In the heat map each cell would show the % complete and be colored with the proper intensity. I think the main advantage of a heat map is that empty cells are clearly defined and will be easier to spot than missing bars.

I would also include English (hard coded at 100% in all cells), as a first column. And order the languages from most done to least done (currently, JA then ES).

But I have never used plotly so I am not sure how much work that would be, we could always have that as a future improvement in a separate issue.
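The ordering suggested above (English first, then most done to least done) does not depend on Plotly. A minimal sketch, assuming each locale maps modules to a dict with a `percentage` key and that "most done" means the mean percentage across modules; the `order_locales` name and the sample numbers are hypothetical.

```python
from statistics import mean


def order_locales(data):
    """Return locale codes with 'en' first, then most complete to least complete."""
    def completion(locale):
        return mean(stats["percentage"] for stats in data[locale].values())

    # Ties fall back to alphabetical order so the result is stable.
    ranked = sorted(
        (locale for locale in data if locale != "en"),
        key=lambda locale: (-completion(locale), locale),
    )
    return ["en"] + ranked


# Illustrative numbers only.
sample = {
    "es": {"index": {"percentage": 100}, "tutorials": {"percentage": 40}},
    "ja": {"index": {"percentage": 80}, "tutorials": {"percentage": 90}},
}
ordered = order_locales(sample)
```

A locale with no modules at all would make `mean` raise, so real data may need a guard for that case.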

flpm · 2025-05-29T19:34:25Z

TRANSLATING.md

+
+```{translation-graph}
+```
+


Maybe here would be a good spot to include the link to the site

seems like we would want both directions? from the contributing page here and vice versa?

flpm · 2025-05-29T19:36:34Z

_ext/translation_graph.py

+            ))
+
+        # Create figure
+        fig = go.Figure(data=traces)


I think a good idea would be a heat map, which is condensed enough we can add many more languages.

lwasser · 2025-05-29T21:20:21Z

https://www.pyopensci.org/python-package-guide/TRANSLATING.html

My apologies, @RobPasMue you are correct, i saw that it was flagged orphan but now i realize it's just a matter of adding a link to it from the guide somewhere.

Please ignore my comment. I'll defer to @flpm for what the final plots look like!! I do want the ability to add more languages in the future, and the ability for users to easily identify gaps and determine where they can contribute (without needing to run a nox session).

Let's add a link to the translating page in a separate PR so you don't have to worry about it here and can focus on the data viz challenge!!

lwasser · 2025-05-29T21:21:51Z

and here is a preview from circleci!!

sneakers-the-rat

cool, ya nice, i needed a distraction today so i spent some time on the plot. haven't used plotly since it came out and whew somehow it became three different packages or something? anyway it's very slick. put my version of the plot in suggestion.

sneakers-the-rat · 2025-05-29T22:56:42Z

TRANSLATING.md

+
+```{translation-graph}
+```
+


seems like we would want both directions? from the contributing page here and vice versa?

sneakers-the-rat · 2025-05-29T23:09:53Z

_ext/translation_graph.py

+
+    def run(self):
+        # Read the JSON file containing translation statistics
+        json_path = Path(__file__).parent.parent / "_static" / "translation_stats.json"


i missed the PR that added the translation_stats script, but i think that this will go out of date almost immediately and become a misleading indicator if we don't generate this during the build process so that at the time the docs are generated the plot and the stats both reflect the same state of the repo.

I personally avoid committing generated data files that only need to exist at deployment time because they make PRs noisy and tempt us to treat them as files we can edit, but if we are to keep it there, we should add it and trigger it from some early build event like builder-inited - see the _post_build and setup functions at the bottom of conf.py

Hi @sneakers-the-rat! Yeah that makes sense - I could move the generation of the JSON file into a build_event. That sounds reasonable.

it took me many years and many attempts of trying before the hook pattern of sphinx sunk in for me, the way you did it was completely understandable

sneakers-the-rat · 2025-05-30T01:14:29Z

_ext/translation_graph.py

+
+    def run(self):
+        # Read the JSON file containing translation statistics
+        json_path = Path(__file__).parent.parent / "_static" / "translation_stats.json"
+        with json_path.open("r") as f:
+            data = json.load(f)
+
+        # Collect all module names -- iterates over the JSON data in 2 levels
+        all_modules = {module for stats in data.values() for module in stats}
+        all_modules = sorted(all_modules)
+
+        # Build one trace per locale with full hover info
+        traces = []
+
+        for locale, modules in data.items():
+            y_vals = []
+            hover_texts = []
+
+            for module in all_modules:
+                stats = modules.get(module)
+                y_vals.append(stats["percentage"])
+
+                hover_text = (
+                    f"<b>{module}</b><br>"
+                    f"Translated: {stats['translated']}<br>"
+                    f"Fuzzy: {stats['fuzzy']}<br>"
+                    f"Untranslated: {stats['untranslated']}<br>"
+                    f"Total: {stats['total']}<br>"
+                    f"Completed: {stats['percentage']}%"
+                )
+                hover_texts.append(hover_text)
+
+            traces.append(go.Bar(
+                name=locale,
+                x=all_modules,
+                y=y_vals,
+                hovertext=hover_texts,
+                hoverinfo="text"
+            ))
+
+        # Create figure
+        fig = go.Figure(data=traces)
+        fig.update_layout(
+            barmode="group",
+            title="Translation Coverage by Module and Locale",
+            xaxis_title="Module",
+            yaxis_title="Percentage Translated",
+            height=600,
+            margin=dict(l=40, r=40, t=40, b=40)
+        )
+
+        div = plot(fig, output_type="div", include_plotlyjs=True)
+        return [nodes.raw("", div, format="html")]


Suggested change

+    # oddly, this is evaluated in the js not python,
+    # so we treat customdata like a json object
+    HOVER_TEMPLATE = """
+    <b>%{customdata.module}</b><br>
+    Translated: %{customdata.translated}<br>
+    Fuzzy: %{customdata.fuzzy}<br>
+    Untranslated: %{customdata.untranslated}<br>
+    Total: %{customdata.total}<br>
+    Completed: %{customdata.percentage}%
+    """
+
+    def run(self):
+        # Read the JSON file containing translation statistics
+        json_path = Path(__file__).parent.parent / "_static" / "translation_stats.json"
+        with json_path.open("r") as f:
+            data: TranslationStats = json.load(f)
+
+        # Sort data by locale and module
+        data = {locale: dict(sorted(loc_stats.items())) for locale, loc_stats in sorted(data.items())}
+
+        # prepend english, everything set to 100%
+        en = {module: ModuleStats(total=stats['total'], translated=stats['total'], fuzzy=stats['total'], untranslated=0, percentage=100) for module, stats in next(iter(data.values())).items()}
+        data = {'en': en} | data
+
+        # extract data to plot
+        locales = list(data.keys())
+        modules = list(data[locales[-1]].keys())
+        values = [[stats['percentage'] for stats in loc_stats.values()] for loc_stats in data.values()]
+        hoverdata = [[{'module': module} | stats for module, stats in loc_stats.items()] for loc_stats in data.values()]
+
+        heatmap = go.Heatmap(
+            x=modules,
+            y=locales,
+            z=values,
+            xgap=5,
+            ygap=5,
+            customdata=np.array(hoverdata),
+            hovertemplate=self.HOVER_TEMPLATE,
+            colorbar={
+                'orientation': 'h',
+                'y': 0,
+                "yanchor": "bottom",
+                "yref": "container",
+                "title": "Completion %",
+                "thickness": 10,
+            },
+            colorscale="Plotly3",
+        )
+
+        # Create figure
+        fig = go.Figure(data=heatmap)
+        fig.update_layout(
+            paper_bgcolor="rgba(0,0,0,0)",
+            plot_bgcolor="rgba(0,0,0,0)",
+            font_color="var(--bs-body-color)",
+            margin=dict(l=40, r=40, t=40, b=40),
+            xaxis_showgrid=False,
+            xaxis_side="top",
+            xaxis_tickangle=-45,
+            xaxis_tickfont={
+                "family": "var(--bs-font-monospace)",
+                "color": "#fff"
+            },
+            yaxis_showgrid=False,
+            yaxis_title="Locale",
+            yaxis_autorange="reversed",
+        )
+
+        div = plot(fig, output_type="div", include_plotlyjs=True)
+        return [nodes.raw("", div, format="html")]

here ya go, here's the heatmap. works in light and dark mode. i used the blue/pink color scale because a) it's cute and b) fits in with the rest of the colors and c) it wasn't altogether obvious to me that yellow means completed :)

only thing i had to do that isn't here is add this to pyos.css since you can use css vars in some places but not others for some reason.

.plotly svg { text { fill: var(--pst-color-text-base) !important; } }

other notes:

flattened out nested iteration by separating data cleaning steps from plot creation steps

use a hovertemplate and customdata also to separate data cleaning from plotting logic

for some reason all plotting libraries want to make the default style very ugly with a bunch of gridlines, shaded backgrounds, and so on. so i removed all the unnecessary stuff so it blended into the page.

make colorbar on bottom, horizontally, so it looks progressbarlike.

we should also probably rescale the colorbar so the difference between "100%" vs "not 100%" is clearer, or we can use a different one, idc. we don't need precise number readout for a plot like this, the purpose i think is just to communicate "which languages are mostly done vs not mostly done and which pages need to be worked on"

sneakers-the-rat · 2025-05-30T01:19:39Z

I guess the last thing to do would be accessibility. since we already have all the data when generating the plot, and we're generating an SVG, we might as well add alt or aria-label or whatever is appropriate for the cells.
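A minimal sketch of the label text such a cell could carry, assuming the stats keys used in the hover text of this PR (`translated`, `fuzzy`, `untranslated`, `total`, `percentage`). The `cell_label` name and the numbers are hypothetical, and how the string gets attached to Plotly's generated SVG is left open here.

```python
def cell_label(locale, module, stats):
    """One-sentence description of a heatmap cell, usable as alt or aria-label text."""
    return (
        f"{module} ({locale}): {stats['percentage']}% complete, "
        f"{stats['translated']} of {stats['total']} entries translated, "
        f"{stats['fuzzy']} fuzzy, {stats['untranslated']} untranslated"
    )


# Illustrative numbers only.
label = cell_label(
    "es",
    "index",
    {"percentage": 75, "translated": 30, "total": 40, "fuzzy": 4, "untranslated": 6},
)
```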

sneakers-the-rat · 2025-05-30T01:21:34Z

_ext/translation_graph.py

+from plotly.offline import plot
+


Suggested change

+from plotly.offline import plot
+import numpy as np

oops, forgot that i added this (though this is just trying to follow the docs where it says it must be an array, but i bet it would work fine just as a list of lists)

Co-authored-by: Jonny Saunders <[email protected]>

RobPasMue · 2025-05-30T06:08:23Z

Thank you for the heatmap implementation @sneakers-the-rat! If y'all like it the way it's rendering (@flpm, @lwasser) I can add the suggestions to the code and then all solved.

I will work on the auto-generation of the JSON info as part of #493 (comment) -- we can probably move away from having the persistent file and generate it on the fly. That would imply that every time someone wants to build the docs, they have to go through that step, which might be unnecessary. If we have a workflow that updates it regularly, I think we solve our problem -- but I am up for discussion.

Bottom line: I can have it implemented either way - either through a Sphinx hook at build time or as a GitHub actions workflow that updates it on a scheduled basis.

sneakers-the-rat · 2025-05-30T08:03:12Z

If we have a workflow that updates it regularly, I think we solve our problem

to be clear i'm good with whatever does something in this neighborhood! just as long as we avoid forgetting something and eventually realizing the wonderful colorful square box has been broken for awhile. cosplaying as ornery reviewer who refuses all nice tooling and docs and rehearses the edge cases.

Also sorry for making a code suggestion that was just like "here's a whole different thing," that is pretty rude of me. I was just having an afternoon where i needed a bit of time to distract myself and you had set up this great canvas in these two PRs, so thanks for that! you can feel free to take or leave any part of that.

And order the languages from most done to least done

rats, i did forget this. i like nice color sorting and it does make sense. it induces a tiny amount of gamification (the translation scoreboard!) which i think could be both cute and useful as a way of knowing where to target effort. there is a maybe remote chance that it makes someone feel bad or unwittingly participate in linguistic-cultural rivalries. So weigh that as an "i'm aware of this being possible and think it's worth raising but have no estimate of either likelihood or magnitude" vs a higher probability of useful information and tidier color gradients.

~ yielding da floor ~ thanks for ur patience

RobPasMue · 2025-05-30T08:36:29Z

Also sorry for making a code suggestion that was just like "here's a whole different thing," that is pretty rude of me. I was just having an afternoon where i needed a bit of time to distract myself and you had set up this great canvas in these two PRs, so thanks for that! you can feel free to take or leave any part of that.

Oh no, never apologize for that! I really appreciate that you took the time to come up with a whole new implementation of the heatmap! I'm really glad it helped you distract yourself. Coding is the perfect way to do it! =) It also helped me understand how to do it with a heatmap! =)

to be clear i'm good with whatever does something in this neighborhood! just as long as we avoid forgetting something and eventually realizing the wonderful colorful square box has been broken for awhile. cosplaying as ornery reviewer who refuses all nice tooling and docs and rehearses the edge cases.

I fully agree - I'm fine with either option too! As long as the data keeps getting updated haha. Also suffered from this in the past.. so I know the feeling.

And order the languages from most done to least done

Regarding this last point - I understand all points as well. I just went on default sorting (alphabetical) but sorting based on completeness makes sense as well to reach out for help on those languages that are not fully there yet. But I'll let @lwasser comment on this last point too =)

and @sneakers-the-rat .. thanks for your thorough review. Once again, I really appreciate your efforts to read it through, provide code suggestions and help out on the implementation! =)

flpm · 2025-05-30T17:54:10Z

I think it looks awesome! It will let us fit many languages into a relatively small screen space.

I think we should add the percentage in the cell. In addition to the "identify where the bigger gaps are" use case, there is also keeping the translation up to date as it ages, "spot where new gaps are appearing".

When the English text evolves, more and more entries will start to be marked fuzzy and the 100% will drift down to 99%, 98%, etc. Without the numbers it will be hard to spot the small changes in the color.

Reviewing fuzzy entries in the .PO file is a very easy task too, so this will help new contributors find those opportunities.
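The in-cell numbers could be prepared as a grid of strings that mirrors the grid of percentages. A small sketch, assuming percentages arrive as numbers and missing cells as `None`; the `cell_text` name is hypothetical. Recent Plotly versions appear to accept such a grid through the heatmap's `text` argument together with `texttemplate`, but that is worth checking against the installed version.

```python
def cell_text(grid):
    """Format a grid of percentages as strings; missing cells become empty text."""
    return [["" if pct is None else f"{pct:g}%" for pct in row] for row in grid]


# Illustrative numbers only.
text = cell_text([[100, 99.5, 40], [80, None, 0]])
```

Showing 99.5% next to 100% makes small drift visible even when the two colors are nearly identical.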

lwasser · 2025-06-04T23:00:34Z

Hi Friends!! Gosh I love how this has evolved and all of the discussion. Thank you for all of the work on this
I have a few cents of comments to add ✨

Here is our color palette:

What if we used a categorical color approach rather than a linear gradient? And then we had categories like:
0-25% complete: (white or yellow which might look white to some, will stand out well and will look "empty") help!
partially done 25-75% done (light magenta -- getting there!)
fuzzy / almost there - light green (so close)
complete! Green DONESO (we could make the green darker too, to make the light green less similar visually for accessibility)

That might make it easier to figure out what is done quickly.
We can also add numbers to each box for additional clarity, which is a fantastic idea, Felipe!
I just wonder if bins will be visually easier to quickly understand where we need help (with a prominent legend next to it)

I'm not married to these colors, but I do like green == done and simplifying the legend (which I think Jonny alluded to above!), and some version of white / light color to signify empty / help.

The light green and dark green might be too similar in my example, but I'm just thinking there is value in someone being able to glance and see what areas need help quickly!
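The binning idea can be sketched independently of the colors. The category names and the exact boundaries below are assumptions: the comment gives 0-25% and 25-75% but does not say which bin a boundary value belongs to, or where "almost there" starts, so this sketch treats 75% up to (but not including) 100% as "almost there".

```python
# (exclusive upper bound, label); boundaries are an assumed reading of the proposal.
BINS = [
    (25, "needs help"),
    (75, "partially done"),
    (100, "almost there"),
]


def categorize(percentage):
    """Map a completion percentage to one of four coarse categories."""
    if percentage >= 100:
        return "complete"
    for upper, label in BINS:
        if percentage < upper:
            return label
    return "complete"  # unreachable for percentages below 100; kept as a safe default


labels = [categorize(p) for p in (0, 25, 74.9, 75, 99, 100)]
```

Each category can then be given one color in the legend, which keeps "done" and "not done" distinguishable at a glance.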

docs: adding translation stats to docs

e018c21

RobPasMue mentioned this pull request May 23, 2025

enh: create a graphic or visual that helps us visualize % translated for each section of the guide by language #493

Open

5 tasks

RobPasMue added 2 commits May 23, 2025 14:09

Update _ext/translation_graph.py

f2ce21d

chore: change single quotes for double quotes

a3567ca

lwasser requested review from flpm and sneakers-the-rat May 28, 2025 16:20

lwasser reviewed May 28, 2025

lwasser requested changes May 28, 2025

lwasser added this to pyOpenSci sprint events 2023-2025 May 28, 2025

lwasser moved this to pyconus-24 in pyOpenSci sprint events 2023-2025 May 28, 2025

lwasser moved this from pyconus-24 to pyconus-25 in pyOpenSci sprint events 2023-2025 May 28, 2025

flpm reviewed May 29, 2025

Merge branch 'main' into feat/translation-graph-render

3dfe760

sneakers-the-rat approved these changes May 30, 2025

sneakers-the-rat reviewed May 30, 2025

Update _ext/translation_graph.py

b3e26d8

Co-authored-by: Jonny Saunders <[email protected]>

